Identification of compositionally distinct regions in genomes using the centroid method

Authors

  • Issaac Rajan
  • Sarang Aravamuthan
  • Sharmila S. Mande
Abstract

MOTIVATION: Most genomic regions of special interest, such as horizontally acquired sequences and genomic islands, are known to have distinct word (m-mer) compositions. Most earlier work in this direction addressed di- and tri-nucleotide compositions. We present a method, called the centroid approach, that can reveal compositionally distinct regions in genomic sequences for any given word size. RESULTS: We applied our method to 50 bacterial genomes and demonstrated its ability to identify embedded sequences of varying lengths from distantly related organisms. For four organisms from our dataset, we also investigated the genetic makeup of the regions that our method identified as compositionally distinct. Pathogenicity island (PAI) components and genes encoding strain-specific proteins are frequently constituents of these regions. AVAILABILITY: The program is available on request from the authors. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
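The abstract does not give the details of the centroid approach, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes non-overlapping windows, a Euclidean distance between each window's m-mer frequency vector and the mean vector (the "centroid") over all windows, and a z-score cutoff for flagging outliers. The function names and the `window`, `step`, and `z_cut` parameters are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, m):
    """Relative frequencies of all 4**m words over ACGT found in seq."""
    words = ["".join(p) for p in product("ACGT", repeat=m)]
    counts = Counter(seq[i:i + m] for i in range(len(seq) - m + 1))
    # Words containing other symbols (e.g. N) are ignored in the total.
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


def window_distances(genome, m, window, step):
    """Distance of every window's m-mer profile from the mean profile."""
    starts = list(range(0, len(genome) - window + 1, step))
    profiles = [kmer_profile(genome[s:s + window], m) for s in starts]
    # Assumed centroid: the component-wise mean of all window profiles.
    centroid = [sum(col) / len(profiles) for col in zip(*profiles)]
    # Assumed distance measure: Euclidean.
    dists = [math.dist(p, centroid) for p in profiles]
    return starts, dists


def distinct_windows(genome, m=2, window=200, step=200, z_cut=2.0):
    """Start positions of windows unusually far from the centroid."""
    starts, dists = window_distances(genome, m, window, step)
    mean = sum(dists) / len(dists)
    sd = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    if sd == 0:
        return []  # all windows are equally far from the centroid
    return [s for s, d in zip(starts, dists) if (d - mean) / sd > z_cut]
```

Calling `distinct_windows(genome, m=3)` on a genome string would return the start coordinates of windows whose trinucleotide composition deviates most from the genome-wide average. Larger word sizes make the profile vector grow as 4^m, so the window length should grow with `m` to keep the frequency estimates meaningful.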


Similar resources

Fuzzy Centroid-Based Method Applied to Customer Requirements Ranking in Diba Fiberglass Company

The purpose of this study is to introduce an application of the fuzzy centroid-based approach to ranking customer requirements using QFD with competition considerations for Diba Fiberglass, an Iranian company. The illustrated approach not only focuses on normal fuzzy numbers but also considers non-normal fuzzy numbers to capture the true customer requirements. To this end, first, we p...


Assessment of compositional heterogeneity within and between eukaryotic genomes.

Using large amounts of long genomic sequences, we studied the compositional patterns of eukaryotic genomes. We developed a simple measure, the compositional heterogeneity (or variability) index, to compare the differences in compositional heterogeneity between long genomic sequences. The index measures the average difference in GC content between two adjacent windows normalized by the standard ...


Investigating genomic structure using changept: A Bayesian segmentation model

Genomes are composed of a wide variety of elements with distinct roles and characteristics. Some of these elements are well-characterised functional components such as protein-coding exons. Other elements play regulatory or structural roles, encode functional non-protein-coding RNAs, or perform some other function yet to be characterised. Still others may have no functional importance, though t...


Identification and determination of different alkaloids from Atropa belladonna L. by Gas chromatography method

Background & Aim: A. belladonna (family: Solanaceae) is an important medicinal plant that contains tropane alkaloids. Tropane alkaloids are a distinct group of secondary metabolites of the Solanaceae family. The most important alkaloids of A. belladonna are atropine and hyoscine, which are used extensively because of their medicinal properties. There...


Simple and Rapid Detection of Yersinia Pestis and Francisella Tularensis using Multiplex-PCR

Background: Yersinia pestis and Francisella tularensis cause plague and tularemia, which are known as diseases of the newborn and elderly, respectively. Immunological and culture-based methods for detecting these bacteria are time-consuming, costly, and complicated, and they require advanced equipment. We aimed to design and synthesize a gene structure as a positive control for molecular detection of these ...



Journal:
  • Bioinformatics

Volume 23, Issue 20

Pages: -

Publication date: 2007